To all whom it may concern:



Be it known that we, Eugene M. Fluder,
Richard D. Hull,
Simon K. Kearsley,
Robert B. Nachbar, 
Robert P. Sheridan, and 
Suresh B. Singh 

have invented certain new and useful improvements in

CHEMICAL STRUCTURE SIMILARITY RANKING SYSTEM
AND COMPUTER-IMPLEMENTED METHOD FOR SAME

of which the following is a full, clear and exact description:

RELATED APPLICATIONS 
This application claims priority to U.S. Provisional
Application Serial No. 60/128,473, filed April 9, 1999 and
incorporated herein by reference.

FIELD OF THE INVENTION

This invention relates, in general, to computer-based
calculation of the similarity of compounds, compositions, mixtures,
and/or chemical structures and, in particular, to the ranking of
chemical compounds, mixtures, and/or compositions in databases,
such as chemical databases, by their similarity to a user's probe
compound(s).

BACKGROUND OF THE INVENTION
Pharmaceutical companies, for example, have large collections
of chemical structures, compounds, or molecules. One or more
employees thereof will find that a particular structure in the
collection has an interesting chemical and/or biological activity,
for example, a property that could lead to a new drug, or a new
understanding of a biological phenomenon.

Similarity searches are a standard tool for drug discovery. 
Given a compound with an interesting biological activity or
property, compounds that are structurally similar to it are likely
to have similar activities or properties. In practice, an
investigator provides a probe and searches over a database of
compounds to find those which are similar. The investigator then
selects some number of the similar compounds for further
investigation.
Chemical similarity algorithms operate over representations of
chemical structure based on various types of features called
descriptors. Descriptors include the class of two dimensional
representations and the class of three dimensional representations.
Two dimensional representations include, for example, standard atom
pair descriptors, standard topological torsion descriptors,
standard charge pair descriptors, standard hydrophobic pair
descriptors, and standard inherent descriptors of properties of the
atoms themselves. By way of illustration, regarding the atom pair
descriptors, for every pair of atoms in the chemical structure, a






108949-101 PATENT 

descriptor is established or built from the type of atom, some of 
its chemical properties, and its distance from the other atom in 
the pair. 

Three dimensional representations include, for example,
standard descriptors accounting for the geometry of the chemical
structure of interest, as mentioned above. For instance, geometry
descriptors take into account a first atom being a short distance
away in three dimensions from a second atom, although the first
atom may be twenty bonds away from the second atom. Topological
similarity searches, especially those based on comparing lists of
pre-computed descriptors, are computationally very inexpensive.
The vector space model of chemical similarity involves the
representation of chemical compounds as feature vectors. Exemplary
features include substructure descriptors, such as atom pairs
and/or topological torsions. An example of an atom pair descriptor
is described by Carhart et al. [1], and an example of a topological
torsion descriptor is described by Nilakantan et al. [2]. Atom
pair descriptors ("AP") are substructures of the form:

$AT_i - (\text{distance}) - AT_j$

where "(distance)" is the distance in bonds between an atom of type
$AT_i$ and an atom of type $AT_j$ along the shortest path. Topological
torsion descriptors ("TT") are of the form:

$AT_i - AT_j - AT_k - AT_l$

where i, j, k, and l are consecutively bonded and distinct atoms.
All of the AP's and/or TT's in a compound are counted to form a
frequency vector. Similarity between two compounds is calculated
as a function of their vectors. Although there are many standard
similarity measures, e.g., Euclidean distance, Manhattan distance,
Dice similarity coefficient, Tanimoto similarity coefficient, and
cosine association coefficient [3], each involves the comparison
of frequencies of matching descriptors in both vectors. However,
we have determined that, as a consequence, if the probe has few
descriptors in common with any one compound in the database, the
search will be met with limited, or no, success.
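
By way of illustration only, the dependence of these measures on matching descriptors can be sketched in a few lines of Python. The count-based forms of the Dice and Tanimoto coefficients used here are one common variant and are an assumption, as are the descriptor strings and the three toy frequency vectors; none of them are taken from the data of this disclosure.

```python
from collections import Counter
from math import sqrt


def dice(a: Counter, b: Counter) -> float:
    """Dice coefficient on counts: 2 * sum(min) / (sum(a) + sum(b))."""
    shared = sum(min(a[d], b[d]) for d in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total else 0.0


def tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto coefficient on counts: sum(min) / sum(max)."""
    keys = a.keys() | b.keys()
    shared = sum(min(a[d], b[d]) for d in keys)
    union = sum(max(a[d], b[d]) for d in keys)
    return shared / union if union else 0.0


def cosine(a: Counter, b: Counter) -> float:
    """Cosine association coefficient between two frequency vectors."""
    dot = sum(a[d] * b[d] for d in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical frequency vectors keyed by atom pair descriptor strings.
probe = Counter({"C-(1)-C": 3, "C-(2)-O": 1})
close = Counter({"C-(1)-C": 2, "C-(2)-O": 1, "C-(3)-O": 1})
disjoint = Counter({"N-(1)-C": 2, "N-(2)-C": 1})

for name, mol in (("close", close), ("disjoint", disjoint)):
    print(name, dice(probe, mol), tanimoto(probe, mol), round(cosine(probe, mol), 3))
```

Every measure returns zero for the compound that shares no descriptor with the probe, whatever its activity may be, which is the limitation noted above.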

Additionally, we have recognized that these searches are often
more involved when the goal is to select compounds that have
similar activity or properties, but not obviously similar
structure. That is, we have identified a need to ascertain, from
a large collection of chemical structures, compounds, or molecules,
a set of diverse chemical structures, for example, that may look
dissimilar from the original probe compound, but exhibit similar
chemical or biological activity. We have recognized that although
algorithms using, for example, Dice-type and/or Tanimoto-type
coefficients, by design, yield compounds that are most similar to
the probe compound, such algorithms may fail to provide compounds
or chemical structures characterized by diversity relative to the
probe compound.

With respect to a chemical example, if a particular compound
were found to be an HIV inhibitor, we have recognized that it would
be desirable to search a database of chemical compounds or
compositions for HIV inhibitors that are related to the original
HIV inhibitor. Specifically, these newly found HIV inhibitors may
very well be dissimilar to the original HIV inhibitor probe.
However, we have appreciated that being able to find one or more
dissimilar HIV inhibitors quickly and effectively can mean billions
of dollars in revenue resulting from exploitation of the dissimilar
HIV inhibitors.

SUMMARY OF THE INVENTION

It is, therefore, a feature and advantage of the instant 
invention to provide a method and/or system for selecting chemical 
compounds that have similar biological or chemical activities or 
properties, but not necessarily obviously similar structures. 

It is another feature and advantage of the instant invention 
to provide a method and/or system for ascertaining, from a large 
collection of chemical structures, compounds, or molecules, a set 
of diverse chemical structures, for example, that optionally look
dissimilar from an original probe compound, but exhibit similar
chemical or biological activity. A probe compound, for example,
includes a chemical structure for which related or behaviorally 
similar chemical structures are sought. 

It is an additional feature and advantage of the instant 
invention to provide a methodology for calculating the similarity 
of chemical compounds to chemical probes. The methodology includes 
the following sequential, non-sequential, or sequence independent 
steps. Chemical descriptors for each compound in a collection of 
compounds are generated or created. The descriptors for a given 
compound are represented as a vector of unique descriptor
frequencies. The collection of compound vectors is represented as
the column vectors of a molecule-descriptor matrix. The singular
value decomposition of this matrix is performed to produce the
singular matrices. The chemical descriptors for user probe
compounds are generated or created. The descriptors of probe
compounds are transformed, using the singular matrices, into the
same coordinate system as the compounds in the collection; each
transformed probe is called a pseudo-object. The similarity of
transformed probes to the compounds in the collection is calculated.
A list of the compounds in the collection ranked by decreasing order
of similarity to the probe(s) is returned or outputted.

Optionally, the step of creating descriptors for compounds in
the collection and probe compounds involves the generation of atom
pair and topological torsion descriptors from the chemical
connection tables of the compounds. The step of creating
descriptors for compounds in the collection includes the creation
of an index of descriptors and an index of compounds in the
collection.

Optionally, the molecule-descriptor matrix is denoted as X.
The step of performing the singular value decomposition produces
singular matrices as $X = P \Sigma Q^T$ of rank r, and a reduced dimension
approximation of X defined as $X_k = P_k \Sigma_k Q_k^T$, $k \ll r$, where P and Q are
the left and right singular matrices representing correlations
among descriptors and compounds respectively, and $\Sigma$ represents the
singular values. The pseudo-object is denoted as $O_p$ and is
calculated from a probe F by $O_p = F^T P_k \Sigma_k^{-1}$. The step of calculating
the similarity between the pseudo-object $O_p$ and the compounds in
the collection is computed by taking the dot product of the normalized
vector of $O_p$ with each normalized row of $Q_k$.

The similarity calculating step includes calculating the
cosine between each pair of vectors. The reduced dimensional
approximation of X is derived by setting the k+1 through r singular
values of $\Sigma$ to zero. The similarities of the pseudo-object to
compounds are calculated by setting the first k singular values of
$\Sigma$ to one. The setting step includes using an identity matrix I.

It is another feature and advantage of the instant invention
to provide a method of generating a searchable representation of
chemical structures. The method includes the following sequential,
non-sequential, or sequence independent steps. The method includes
generating an index of unique features. The method also includes
generating a feature-chemical structure matrix. The method further
includes determining correlations between chemical structures based
on the generated feature-chemical structure matrix for generating
the searchable representation of the chemical structures.

The index of unique features includes chemical descriptors.
The method includes generating the chemical descriptors from 
connection tables prior to the index-generating step. The 
determining step includes performing singular value decomposition 
of the feature-chemical structure matrix. The chemical descriptors 
include at least one of atom pair descriptors, topological torsion 
descriptors, charge pair descriptors, hydrophobic pair descriptors, 
inherent atom property descriptors, and geometry descriptors. 

It is another feature and advantage of the instant invention 
to provide a computer readable medium including instructions being 
executable by a computer, the instructions instructing the computer 
to generate a searchable representation of chemical structures. 
The instructions include generating an index of unique features. 
The instructions also include generating a feature-chemical
structure matrix. The instructions further include determining
correlations between chemical structures based on the generated
feature-chemical structure matrix for generating the searchable
representation of the chemical structures.

In the computer readable medium, the index of unique features
includes chemical descriptors. The method includes generating the
chemical descriptors from connection tables prior to the
index-generating step. The determining step includes performing singular
value decomposition of the feature-chemical structure matrix. The 
chemical descriptors include at least one of atom pair descriptors, 
topological torsion descriptors, charge pair descriptors, 
hydrophobic pair descriptors, inherent atom property descriptors 
and geometry descriptors. 

The instructions further include determining whether a user 
has input a query compound probe, generating chemical descriptors 
for the query compound probe, calculating similarities between the 
chemical descriptors for the query compound probe and the 
searchable representation of the chemical structures, and ranking 
the chemical structures by similarity to the query compound probe. 
The instructions optionally further include modifying the query 
compound probe based on the generated results for the original 
query compound probe.

The challenge of selecting functionally similar, yet
structurally different compounds from a chemical database can be
accomplished by using latent structures statistically derived from
the chemical database. The idea is to exploit these structures or
correlations among the original chemical descriptors present in the
database to calculate the similarity between probe compound(s) and
compounds in the database. This invention, called Latent Semantic
Structure Indexing or LaSSI, embodies these ideas.

Ranking compounds to a probe compound using the similarity of 
the reduced dimensional descriptors versus the similarity of the 
original descriptors has several advantages including the 
following. Latent structure matching is more robust than 
descriptor matching, discussed hereinbelow. The choice of the 
number of singular values provides a rational way to vary the 
resolution of the search. Probes created from more than one 
molecule are optionally and advantageously handled. The reduction 
in the dimensionality of the chemical space increases searching 
speed. 

There has thus been outlined, rather broadly, the more 
important features of the invention in order that the detailed
description thereof that follows may be better understood, and in
order that the present contribution to the art may be better 
appreciated. There are, of course, additional features of the 
invention that will be described hereinafter and which will form 
the subject matter of the claims appended hereto. 

In this respect, before explaining at least one embodiment of 
the invention in detail, it is to be understood that the invention 
is not limited in its application to the details of construction 
and to the arrangements of the components set forth in the 
following description or illustrated in the drawings. The invention
is capable of other embodiments and of being practiced and carried 
out in various ways. Also, it is to be understood that the 
phraseology and terminology employed herein are for the purpose of 
description and should not be regarded as limiting. 

As such, those skilled in the art will appreciate that the 
conception, upon which this disclosure is based, may readily be 
utilized as a basis for the designing of other structures, methods 
and systems for carrying out the several purposes of the present 
invention. It is important, therefore, that the claims be regarded
as including such equivalent constructions insofar as they do not
depart from the spirit and scope of the present invention. 

Further, the purpose of the foregoing abstract is to enable 
the U.S. Patent and Trademark Office and the public generally, and 
especially the scientists, engineers and practitioners in the art 
who are not familiar with patent or legal terms or phraseology, to 
determine quickly from a cursory inspection the nature and essence 
of the technical disclosure of the application. The abstract is 
neither intended to define the invention of the application, which 
is measured by the claims, nor is it intended to be limiting as to 
the scope of the invention in any way. 

These together with other objects of the invention, along with 
the various features of novelty which characterize the invention, 
are pointed out with particularity in the claims annexed to and 
forming a part of this disclosure. For a better understanding of 
the invention, its operating advantages and the specific objects 
attained by its uses, reference should be had to the accompanying 
drawings and descriptive matter in which there are illustrated
preferred embodiments of the invention.

NOTATIONS AND NOMENCLATURE

The detailed descriptions which follow may be presented in 
terms of program procedures executed on a computer or network of 
computers. These procedural descriptions and representations are 
the means used by those skilled in the art to most effectively 
convey the substance of their work to others skilled in the art. 

A procedure is here, and generally, conceived to be a self- 
consistent sequence of steps leading to a desired result. These 
steps are those requiring physical manipulations of physical 
quantities. Usually, though not necessarily, these quantities take 
the form of electrical or magnetic signals capable of being stored, 
transferred, combined, compared and otherwise manipulated. It 
proves convenient at times, principally for reasons of common 
usage, to refer to these signals as bits, values, elements, 
symbols, characters, terms, numbers, or the like. It should be 
noted, however, that all of these and similar terms are to be 
associated with the appropriate physical quantities and are merely 
convenient labels applied to these quantities.

Further, the manipulations performed are often referred to in
terms, such as adding or comparing, which are commonly associated 
with mental operations performed by a human operator. No such 
capability of a human operator is necessary, or desirable in most 
cases, in any of the operations described herein which form part of 
the present invention; the operations are machine operations. 
Useful machines for performing the operation of the present 
invention include general purpose digital computers or similar 
devices. 

DESCRIPTION OF THE DRAWINGS 

Figure 1 is a flow chart depicting the processes of creating 
LaSSI databases and handling user probes; 

Figure 2 shows a probe chemical structure and the six most 
similar compounds to that probe by each of the methods as described 
in the illustrative example; 

Figure 3 shows a pair of dendrograms illustrating the self- 
similarity of the 58 compounds as determined by both of the methods 
described in the illustrative example; 




Figure 4 is a plot of 58 compounds and the probe in the space 
of the first two singular vectors. The shaded region represents 
that area of space which is within 9° of the probe; 

Figure 5 is a flow chart of another embodiment of the instant
invention;
Figure 6a shows standard probes used in a comparison study; 
Figure 6b shows standard probes used in the comparison study; 
Figure 7 shows probes used for peptide to non-peptide tests; 
Figure 8 is an initial enhancement graph; 

Figure 9 is a graph showing a correlation of rank for the Dice 
and LaSSI methodologies; 

Figure 10 shows selected compounds having different ranks 
according to the Dice and LaSSI methodologies; 

Figure 11 is a graph of a mean similarity of a probe compound 
to each chemical molecule in the top scoring 300 compounds;

Figure 12 is a graph of cumulative actives found versus 
compounds tested; 

Figure 13 shows selected non-peptide compounds having 
different ranks according to the Dice and LaSSI methodologies;

Figure 14 is an illustrative embodiment of a computer and
assorted peripherals;

Figure 15 is an illustrative embodiment of internal computer
architecture consistent with the instant invention; and

Figure 16 is an illustrative embodiment of a memory medium.

DETAILED DESCRIPTION OF THE INVENTION 
A text metaphor is helpful to explain the shortcomings that we
recognized in the existing search methods. A search for documents
about cars from a collection of documents covering a range of
topics may include a keyword query, such as "car." However, a
query limited to the word "car" will miss documents referring only
to "automobile" because "car" and "automobile" are different
descriptors and are not identical even though they define the same
object. To uncover the relationship between "car" and
"automobile," it may be noted that articles referring to cars also
refer to gasoline, turnpikes, and steering wheels. It may also be
noted that some or all of these terms are also found in articles
referring to automobiles. Accordingly, a relationship or a pattern
of association can be generated between articles referring to cars
and those referring to automobiles. Thus, using such a technique,
a search using a keyword query of "car" would yield articles
referring to automobiles because it has been established that "car"
and "automobile" are related.
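
The text metaphor above can be made concrete with a small sketch. The five toy documents are hypothetical, and the association measure (cosine between the sets of words that share a document with each term) is a deliberately simple stand-in chosen for illustration; it is not the singular value decomposition described later in this disclosure.

```python
from math import sqrt

# Hypothetical toy collection: no document contains both "car" and "automobile".
DOCS = {
    "d1": ["car", "gasoline", "turnpike"],
    "d2": ["car", "steering", "gasoline"],
    "d3": ["automobile", "gasoline", "steering"],
    "d4": ["automobile", "turnpike"],
    "d5": ["recipe", "flour", "oven"],
}


def keyword_hits(word):
    """Exact keyword matching: names of documents that contain the word."""
    return sorted(name for name, words in DOCS.items() if word in words)


def context_vector(term):
    """Count the other words that appear in the same documents as `term`."""
    context = {}
    for words in DOCS.values():
        if term in words:
            for other in words:
                if other != term:
                    context[other] = context.get(other, 0) + 1
    return context


def cosine(u, v):
    """Cosine between two sparse count vectors stored as dicts."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


print(keyword_hits("car"))
print(cosine(context_vector("car"), context_vector("automobile")))
print(cosine(context_vector("car"), context_vector("recipe")))
```

The keyword query retrieves only the two documents that literally contain "car", while the association score between "car" and "automobile" is high and the score between "car" and "recipe" is zero, because only the first pair shares surrounding vocabulary.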
In view of the above-mentioned shortcomings of existing search
methods, we noted with interest U.S. Patent No. 4,839,853 to
Deerwester et al., incorporated herein by reference. This patent
discloses a methodology for retrieving textual data objects.
Deerwester et al. postulates that there is an underlying latent
semantic structure in word usage data that is partially hidden or
obscured by the variability of word choice. A statistical approach
is utilized to estimate this latent semantic structure and uncover
the latent meaning. That is, words, the text objects, and the user
queries are processed to extract this underlying meaning and the
new, latent semantic structure domain is then used to represent and
retrieve information. However, Deerwester et al. fails to suggest
any relevance to chemical structures, as neither a recognition of
the instant need, nor a recognition of a solution thereto is
addressed.

At a high level, the instant invention, which overcomes the
above-mentioned shortcomings, is described as follows. We have
determined that a standard mathematical technique called singular
value decomposition ("SVD") facilitates the manipulation of key
words or descriptors. A matrix representing every chemical
structure, compound, or molecule in a database is generated using
standard descriptors, as described by way of illustration above.
At least some of the descriptors are correlated. The SVD technique
uncovers these correlations or associations, which are used to rank
the chemical structures, compounds, or molecules. Advantageously,
the SVD method provides partial, if not full, credit for
descriptors that are related, if not equivalent. That is, the
descriptors need not be direct synonyms. Rather, they are
optionally similar or related terms.

We have discovered that the SVD technique, as applied to a
chemical context according to the instant invention, ranks highly
chemical compounds or structures that do not directly appear to be
similar at a superficial level, but are similar given the
associations made in the database of chemical structures or
compounds. By way of illustration, many organic compounds are
built about carbon rings. In a six-membered ring, for example,
using atom pair descriptors, not only is there always a carbon atom
that is one bond away from another carbon atom, but also there is
a carbon atom that is two bonds away from another carbon atom as
well as a carbon atom that is three bonds away from another carbon
atom. In view of this observation, we have recognized that these
atom pairs are highly associated, although they are not conceptual
synonyms. We have appreciated that the SVD technique facilitates
ranking of chemical compounds or structures based on the number
and/or degree of these associations.
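
The six-membered ring observation can be checked with a short sketch that counts atom pair descriptors by breadth-first search over the bond graph. As a simplifying assumption, the atom type here is just the element symbol, hydrogens are omitted, and the descriptor string format and function names are hypothetical; atom types in practice also carry chemical properties of the atom.

```python
from collections import Counter, deque


def bond_distances(adjacency, start):
    """Shortest-path distance in bonds from `start` to every reachable atom."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    return distance


def atom_pairs(atom_types, bonds):
    """Count descriptors of the form 'Ti-(d)-Tj' over all unordered atom pairs."""
    adjacency = {i: [] for i in range(len(atom_types))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    counts = Counter()
    for i in range(len(atom_types)):
        for j, d in bond_distances(adjacency, i).items():
            if j > i:  # visit each unordered pair once
                first, second = sorted((atom_types[i], atom_types[j]))
                counts[f"{first}-({d})-{second}"] += 1
    return counts


# Heavy-atom skeleton of a six-membered carbon ring.
ring_types = ["C"] * 6
ring_bonds = [(i, (i + 1) % 6) for i in range(6)]
ring = atom_pairs(ring_types, ring_bonds)
print(ring)
```

Any compound containing such a ring contributes carbon-carbon pairs at distances of one, two, and three bonds together, which is why these descriptors co-occur across a database and become correlated.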

The description of the inventive method can be further 
understood in the context of an illustrative example. 

Illustrative Example 
To demonstrate the LaSSI method and to expose how it differs
from standard vector model search techniques, we have created a
small database of fifty-eight monoterpenes that can be examined in
detail, as shown in Fig. 2, by way of illustration. Monoterpenes
are small molecules, for example, ten carbon atoms arranged as two
isoprene units, produced by plants, ostensibly to attract insects
with their distinctive smells. Each compound is represented by a
data structure called a connection table. Two-dimensional chemical
descriptors, such as atom pair descriptors, are generated for each
compound from its connection table. Descriptors
occurring in more than one compound are used to create an index of
unique descriptors and a matrix relating descriptors to compounds,
where the value of element (i,j) of the matrix is the frequency of
descriptor i in compound j. Table 1 depicts a portion of the
matrix created for the fifty-eight compounds.
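
A minimal sketch of this indexing step follows, assuming each compound's descriptors have already been counted into a `Counter`. The compound names, descriptor strings, and the function name `build_matrix` are hypothetical.

```python
from collections import Counter


def build_matrix(compounds):
    """Build the descriptor index, compound index, and frequency matrix.

    `compounds` maps a compound name to a Counter of descriptor frequencies.
    Only descriptors occurring in more than one compound are kept, and
    matrix[i][j] is the frequency of descriptor i in compound j.
    """
    containing = Counter()
    for counts in compounds.values():
        for descriptor in counts:
            containing[descriptor] += 1
    descriptors = sorted(d for d, n in containing.items() if n > 1)
    names = list(compounds)
    matrix = [[compounds[name][d] for name in names] for d in descriptors]
    return descriptors, names, matrix


# Hypothetical three-compound collection.
toy = {
    "mol_a": Counter({"C-(1)-C": 4, "C-(2)-O": 1}),
    "mol_b": Counter({"C-(1)-C": 2, "C-(3)-O": 1}),
    "mol_c": Counter({"C-(2)-O": 2, "C-(4)-N": 1}),
}
descriptors, names, matrix = build_matrix(toy)
for descriptor, row in zip(descriptors, matrix):
    print(descriptor, row)
```

Descriptors found in a single compound carry no information about relationships between compounds, so dropping them shrinks the matrix without removing any correlation the decomposition could use.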

Table 1. A Portion of the Descriptor-Molecule Matrix for the 58
Monoterpene Example

[Table body not recoverable from the source text.]

Performing a singular value decomposition of this matrix 
generates fifty-seven non-zero singular values and their 
corresponding singular vectors, or latent structures. The choice 
of the number of latent structures to use directly affects compound 
similarities. Fig. 3 depicts an example of a dendrogram using the 
vectors corresponding to the two largest singular values. The 
compounds form four highly-related groups. Similarities among
compounds are shown graphically, by way of example, in Fig. 4 by 
treating the values of the two dimensions as spatial coordinates. 

In Fig. 4, the fifty-eight monoterpenes are represented as
filled circles. A probe compound, such as 4-t-butylcyclohexanol,
which smells very much like camphor, but is not a monoterpene and
is not part of the database, is represented as an open circle.
Similarity between compounds is then calculated by computing the
cosine of the angle between their position vectors in this
two-dimensional space.
The similarities of the fifty-eight compounds to the probe compound
can also be easily calculated. The shaded region in Figure 4
represents that area of space which is within 9° (2.5% of the unit
circle) of the probe. Other suitable percentages are acceptable,
depending on the desired amount of correlation between the database
compound and the probe compound. The six most similar
monoterpenes shown in Figure 2 which fall within this range are
listed in Table 2.

Table 2. Six most similar compounds to probe selected by LaSSI

LaSSI similarity    Compound
0.999982            oxypinocamphone
0.999751            camphor
0.999702            terpin
0.999594            3-hydroxycamphor
0.999450            eucalyptol
0.999079            lineatin

A traditional similarity measure, the Tanimoto similarity 
coefficient, would produce the similarities in Table 3. 

Table 3. Six most similar compounds to probe selected by Tanimoto
similarity

Tanimoto similarity    Compound
0.532                  terpin
0.435                  eucalyptol
0.389                  menthol
0.389                  isoborneol
0.389                  borneol
0.361                  α-terpineol


The advantage of this approach can be seen by comparing the ranks
of camphor produced by the two approaches. Tanimoto similarity
ranks camphor 16th (0.282), whereas LaSSI ranks it 2nd (0.9997 or 1.2°).
Although the Tanimoto similarity can rank compounds which share
descriptors with the probe, it has no way of estimating the
similarity of compounds which do not. LaSSI, on the other hand,
does not suffer from this limitation.
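
Because the LaSSI similarity in this example is a cosine, each value in Table 2 corresponds to an angle from the probe. The sketch below converts the tabulated cosines to degrees and applies a 9° cutoff; reading the shaded region as "angle to the probe of at most 9°" is an assumption, and the function and variable names are illustrative.

```python
from math import acos, cos, degrees, radians

# LaSSI similarities copied from Table 2.
TABLE_2 = {
    "oxypinocamphone": 0.999982,
    "camphor": 0.999751,
    "terpin": 0.999702,
    "3-hydroxycamphor": 0.999594,
    "eucalyptol": 0.999450,
    "lineatin": 0.999079,
}


def angle_in_degrees(cosine_similarity):
    """Angle between two vectors, in degrees, given their cosine."""
    clipped = max(-1.0, min(1.0, cosine_similarity))  # guard against round-off
    return degrees(acos(clipped))


# A cosine cutoff equivalent to an angular cutoff of 9 degrees.
CUTOFF = cos(radians(9.0))
inside = [name for name, sim in TABLE_2.items() if sim >= CUTOFF]
angles = {name: angle_in_degrees(sim) for name, sim in TABLE_2.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.2f} degrees")
print("fraction of the unit circle spanned by 9 degrees:", 9.0 / 360.0)
```

Cosines this close to one compress small angular differences into the last few decimal places, so reporting the angle alongside the cosine makes the ranking easier to read.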

Mathematical Background 
The mathematical underpinnings of LaSSI were inspired by
Latent Semantic Indexing (LSI), an information retrieval technique
described in the Deerwester et al. article [4] and U.S. Patent No.
4,839,853 to Deerwester et al., both incorporated herein by
reference. LSI represents a collection of text documents as a
term-document matrix for the purpose of retrieving documents from
the collection given a user's query. LaSSI, on the other hand,
uses a chemical descriptor-molecule matrix to calculate chemical
similarities. Hence, the nature of the input matrices for LaSSI
and LSI is very different. The mathematical treatment of these
matrices, however, is the same. Later we will see that the
calculation of object similarities made by LSI and LaSSI is
related, but different.

LaSSI involves the singular value decomposition of a chemical
descriptor-molecule matrix, X, where the column vectors of X
describe each molecule. The SVD technique is well-known in the
linear algebra literature and has been used in many engineering
applications including signal and spectral analysis. Here we show
a novel application of SVD to the problem of chemical similarity.
For the purpose of this disclosure, the terms descriptors and
molecules as the rows and columns of X, respectively, will be used
interchangeably with the more general terms "features" and
"objects".

Let the SVD of X in $R^{m \times n}$ be defined as $X = P \Sigma Q^T$, where P is an
m×r matrix, called the left singular matrix, where r is the
rank of X, and its columns are the eigenvectors of $X X^T$
corresponding to nonzero eigenvalues. Q is an n×r matrix, called
the right singular matrix, whose columns are the eigenvectors of
$X^T X$ corresponding to nonzero eigenvalues. $\Sigma$ is an r×r diagonal
matrix, $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$, whose nonzero elements, called
singular values, are the square roots of the eigenvalues and have
the property that $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r$. The $k$th rank approximation of X, $X_k$,
for k < r, with $\sigma_{k+1}, \ldots, \sigma_r$ set to 0, can be efficiently computed using
variants of the standard Lanczos algorithm (Berry, 1996). $X_k$ is
the matrix of rank k which is closest to X in the least squares
sense and is called a partial SVD of X and is defined as $X_k = P_k \Sigma_k Q_k^T$.
Given the partial SVD of X, similarities between features,
between objects, and between a feature and an object are computed.
Furthermore, we compute the similarity of ad hoc query objects,
such as column vectors which do not exist in X, to both the
features and the objects in the database. The similarity of two
features, $F_i$ and $F_j$, can be calculated by computing the dot product
between the $i$th and $j$th rows of the matrix $P_k \Sigma_k$. The similarity of
two objects, $O_i$ and $O_j$, can be calculated by computing the dot
product between the $i$th and $j$th rows of the matrix $Q_k \Sigma_k$. The
similarity of a feature, $F_i$, to an object, $O_j$, can be calculated by
computing the dot product between the $i$th row of the matrix $P_k \Sigma_k^{1/2}$
and the $j$th row of the matrix $Q_k \Sigma_k^{1/2}$. Finally, the similarity of an
ad hoc query to the features and objects in the databases can be
calculated by first projecting it into the k-dimensional space of
the partial SVD and then treating the projection as a
"pseudo-object" for between and within comparisons. The pseudo-object of
a query, F, is defined as $O_p = F^T P_k \Sigma_k^{-1}$.

Unlike LSI, however, LaSSI need not use the singular values to 
scale the singular vectors. Instead, the identity matrix I is
used in place of $\Sigma_k$ for calculating similarities. This improves
the system's ability to select functionally similar compounds from 
large chemical databases. 
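
The difference between the LSI scaling and the LaSSI variant can be
illustrated with a small numerical sketch. The following Python
fragment is provided by way of example only and is not the
implementation used in the study. It assumes the NumPy library, a toy
4 x 3 descriptor-molecule matrix, and hypothetical helper names
(partial_svd, pseudo_object, cosine_scores). Using the identity matrix
in place of $\Sigma_k$ is interpreted here as omitting the $\Sigma_k^{-1}$ factor
when the probe is projected.

```python
import numpy as np

def partial_svd(X, k):
    """Return P_k (t x k), the k largest singular values, and Q_k (n x k)."""
    P, s, Qt = np.linalg.svd(X, full_matrices=False)
    return P[:, :k], s[:k], Qt[:k, :].T

def pseudo_object(F, P_k, s_k, use_identity=True):
    """Project a descriptor-frequency vector F into the k-dimensional space.

    use_identity=True  : identity matrix in place of Sigma_k (LaSSI reading, assumed)
    use_identity=False : LSI scaling, F^T P_k Sigma_k^{-1}
    """
    projection = F @ P_k
    return projection if use_identity else projection / s_k

def cosine_scores(q, Q_k):
    """Normalized dot product of q with every row (molecule) of Q_k."""
    norms = np.linalg.norm(Q_k, axis=1) * np.linalg.norm(q)
    return (Q_k @ q) / np.where(norms == 0, 1.0, norms)

# Toy matrix: 4 descriptors (rows) x 3 molecules (columns); values are frequencies.
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [0.0, 1.0, 0.0]])

P_k, s_k, Q_k = partial_svd(X, k=2)
probe = X[:, 0]                      # use molecule 0 as the probe
scores = cosine_scores(pseudo_object(probe, P_k, s_k), Q_k)
```

In this toy case molecules 0 and 2 share descriptors and molecule 1
shares none with the probe, so molecule 2 scores far above molecule 1.
With the LSI scaling, the pseudo-object of a column already in X
coincides with the corresponding row of $Q_k$.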

Methodology 

There are two distinct phases of processing: 1) constructing 
a LaSSI version of a chemical database, and 2) calculating the
similarity of probe molecule(s) to the compounds of the LaSSI
database. The first phase is computationally expensive; however,
it needs to be performed only once to create the database. The
second phase, on the other hand, can be accomplished very quickly:
a search of a modest database (~10^ compounds) can be performed in,
for example, under two minutes using a standard computer. This
section describes the details of both phases.

Constructing a LaSSI Database

Generating a LaSSI database includes the following sequential,
non-sequential, or sequence independent steps. A user and/or a
computer generates or creates chemical descriptors for each
compound represented in the database in step S100. The user and/or
the computer generates or creates an index relating the columns of
the matrix to the compounds and another index relating the rows of
the matrix to the chemical descriptors in step S110. The user
and/or the computer generates or creates a chemical
descriptor-molecule matrix representing the compounds in the chemical
database in step S120. The user and/or the computer performs SVD on
this matrix in step S130.

The creation of a descriptor-molecule matrix is provided by
way of example as follows. First, one must decide on how molecules
are to be represented, i.e., what descriptors are to be used. In
our experience, two dimensional topological descriptors, such as
atom pairs (AP) and topological torsions (TT), have worked extremely
well. We have also experimented with three dimensional geometric
descriptors, combinations of two dimensional and three dimensional
descriptors, and biological descriptors, all of which are
acceptable according to the instant invention. However, for ease
of understanding the instant invention, we will restrict our
discussion of descriptors to only combinations of APs and TTs.
AP and TT descriptors are generated from the connection table of
each compound in a chemical database. A first pass through the
database is performed to create a catalog of unique descriptors and
another catalog of each molecule. Then, a second pass creates a
list of the frequency of each descriptor found in each molecule.
Recall that the value of matrix element (i, j) of X is the frequency
of descriptor i in molecule j.
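
The two-pass construction described above can be sketched as follows.
This is a minimal illustration under stated assumptions: descriptor
generation from the connection table is not shown, the descriptor
strings are hypothetical placeholders, and the function name
build_descriptor_matrix is not part of any existing library.

```python
from collections import Counter

def build_descriptor_matrix(molecules):
    """Build a descriptor-molecule frequency matrix.

    molecules: dict mapping a molecule id to a list of descriptor strings,
               one entry per occurrence of the descriptor in that molecule.
    Returns (row_index, col_index, X) where X[i][j] is the frequency of
    descriptor i in molecule j.
    """
    # First pass: catalog the unique descriptors and the molecules.
    descriptors = sorted({d for descs in molecules.values() for d in descs})
    row_index = {d: i for i, d in enumerate(descriptors)}
    col_index = {m: j for j, m in enumerate(molecules)}

    # Second pass: record the frequency of each descriptor in each molecule.
    X = [[0] * len(col_index) for _ in row_index]
    for mol, descs in molecules.items():
        j = col_index[mol]
        for d, count in Counter(descs).items():
            X[row_index[d]][j] = count
    return row_index, col_index, X

# Hypothetical descriptor labels, for illustration only.
toy = {
    "mol_A": ["C-(2)-N", "C-(2)-N", "C-(3)-O"],
    "mol_B": ["C-(3)-O", "N-(4)-O"],
}
rows, cols, X = build_descriptor_matrix(toy)
```

For a database of realistic size the matrix is very sparse, so a
sparse storage format would normally replace the nested lists used
here.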

The resulting matrix is used as input for public-domain SVD
routines which produce the partial SVD of the matrix. We generally
select the 1000 largest singular values and vectors for a LaSSI
database. The database consists of the singular values and right
and left singular vectors produced by the SVD.

Querying a LaSSI Database 
Querying a LaSSI database is carried out as follows. A user 
specifies a single compound or multiple compounds as a probe in 
step S200. The connection table of a probe molecule, or multiple 
molecules in the case of a joint probe, is converted to the 
descriptor set of the LaSSI database to create a feature, or 
column, vector for the probe in step S210. A pseudo-object is then 
obtained as described in the mathematics section above for some k, 
specified by the user in step S220. The normalized dot products of
each molecule, i.e., each row of $Q_k$, with the pseudo-object are
calculated in step S230, and the resulting values are sorted in
descending order in step S240, maintaining the index of the 
molecule responsible for that value. The user is then presented 
with a list of the top ranked molecules cut off at a user defined
threshold, e.g., the top 300 or 1000 compounds in step S250.

By varying the number of singular values, based at least in
part on the choice of k, the user controls the level of fuzziness
of the search. Larger values of k are less fuzzy than smaller
values thereof.
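
The scoring, sorting, and cutoff of steps S230 through S250 can be
sketched as follows. The fragment is illustrative only: it assumes
the pseudo-object and the k-dimensional molecule vectors have already
been computed, and the names normalized_dot and rank_molecules are
hypothetical.

```python
import math

def normalized_dot(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def rank_molecules(pseudo_object, molecule_vectors, top_n):
    """Score every molecule, sort in descending order, keep the top_n.

    molecule_vectors: dict mapping a molecule id to its k-dimensional vector.
    Returns a list of (molecule id, score) pairs so that the index of the
    molecule responsible for each value is maintained.
    """
    scored = [(mol_id, normalized_dot(pseudo_object, vec))
              for mol_id, vec in molecule_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Toy two-dimensional vectors (k = 2), for illustration only.
vectors = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
top = rank_molecules([2.0, 0.0], vectors, top_n=3)
```

Repeating the search with a different k changes only the length of
the vectors passed in, which is how the fuzziness of the search would
be varied in this sketch.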

Figure 5 shows a flow chart of an alternative embodiment of a
method consistent with the instant invention. The method includes
the following sequential, non-sequential, or sequence independent
steps. In step S300, a computer determines whether a user has
input a query compound probe or query joint probe. If yes, in step
S310, the computer generates chemical descriptors for the query
compound probe or joint probe. In step S320, the computer
determines whether the user has modified the query in view of the
generated results. The user can select ranked compounds and add
them to the original probe and re-execute the search. If yes, flow
returns to step S310. Otherwise, in step S330, the computer
transforms the modified query probe into multi-dimensional space
using singular value decomposition matrices. In step S340, the
computer calculates the similarity between the query probe and the
chemical structures in the compounds database. In step S350, the
computer ranks the compounds in the compound database by similarity
to the query probe. In step S360, the computer outputs a ranked
list of compounds in a standard manner, for example, via a standard
computer monitor or via a standard printer.

LaSSI/TOPOSIM Comparison Study

The following includes results of a series of experiments
comparing the LaSSI technology to one of Merck's existing screening
systems, TOPOSIM. During this discussion, TOPOSIM will often be
referred to by its default similarity metric, in this case "Dice"
similarity.

Measures of merit for similarity searches 
In "Chemical Similarity Using Physiochemical Property
Descriptors," J. Chem. Inf. Comput. Sci., 1996, 36, 118-127,
Kearsley et al. [5], we proposed two measures of efficacy for
similarity methods. The measures are based on a retrospective
screening experiment. Imagine a database of N candidates. The
candidates are ranked in order of decreasing similarity score. The
candidate most similar to the probe is rank 1, the next rank 2,
etc. The candidates are "tested" in order of increasing rank and
the cumulative number of actives found is monitored as a function
of candidates tested. The measures are as follows.

1) A first measure is the number of compounds tested until
half the actives are found. We called this number A50. A50
can be more usefully expressed as a global enhancement, the
ratio of the A50 expected for the random case (N/2) over the
actual A50.

2) A second measure is the number of actives found after
testing an arbitrary small fraction of the total database.
For instance, the number of actives at 300 compounds
tested could be called A@300. A@300 is better expressed as an
initial enhancement: the number of actives in the top ranked
300 compounds (ranked by the method under investigation)
divided by the number of actives expected if the ranks of the
actives were randomly assigned in the range 1 to N.
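
The two measures can be sketched in a few lines of Python. The sketch
assumes that the ranks of the actives are available as a list of
integers, and that "half the actives" is rounded up when the number of
actives is odd; the text does not specify the rounding, so that choice
is an assumption. The function names are hypothetical.

```python
def a50(active_ranks):
    """Number of candidates tested, in rank order, until half the actives are found."""
    ranks = sorted(active_ranks)
    half = (len(ranks) + 1) // 2      # assumption: round half up for odd counts
    return ranks[half - 1]

def global_enhancement(active_ranks, n_candidates):
    """A50 expected for the random case (N/2) divided by the actual A50."""
    return (n_candidates / 2) / a50(active_ranks)

def initial_enhancement(active_ranks, n_candidates, cutoff=300):
    """Actives found in the top `cutoff` ranks divided by the number expected at random."""
    found = sum(1 for r in active_ranks if r <= cutoff)
    expected = len(active_ranks) * cutoff / n_candidates
    return found / expected

# Toy example: 5 actives in a database of 10,000 candidates.
ranks_of_actives = [1, 2, 5, 400, 9000]
ge = global_enhancement(ranks_of_actives, 10000)
ie = initial_enhancement(ranks_of_actives, 10000)
```

In the toy example the third active is found at rank 5, so A50 is 5
and the global enhancement is 5000/5. Three actives fall in the top
300 against 0.15 expected at random, giving an initial enhancement of
about 20.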

Diversity 

Our objective is for LaSSI to find a more diverse set of
actives than TOPOSIM, especially at ranks less than or equal
to 300, diverse in the sense that we want to see more actives that
are not obvious analogs of the probe. We need a way to measure
diversity to confirm this. There is an unavoidable circularity in
comparing similarity methods by a diversity measure since diversity
itself depends on a particular definition of similarity. Our
resolution of this was to settle on the Dice similarity with the
topological torsion ("TT") descriptor as a standard. In our
earlier work, the TT was the least fuzzy descriptor and it has been
our experience that only close analogs are recognized as very
similar. One simple diversity measure, which we will call the
MSP300, is defined as the mean Dice TT similarity of the probe with
all the molecules in the top 300, not including the probe itself.
One could do the same with only the actives in the top 300, but
that would not be as useful because there are many situations where
the number of such actives is very small.
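
A minimal sketch of the MSP300 calculation follows. The exact Dice
formula used by TOPOSIM is not reproduced in this section, so the
sketch assumes one common count-based form, twice the sum of the
shared descriptor counts divided by the total counts of both
molecules. The function names and descriptor labels are hypothetical.

```python
def dice_similarity(freq_a, freq_b):
    """Count-based Dice similarity between two descriptor-frequency dicts.

    Assumed form: 2 * sum(min(a_i, b_i)) / (sum(a_i) + sum(b_i)).
    """
    shared = sum(min(count, freq_b.get(d, 0)) for d, count in freq_a.items())
    total = sum(freq_a.values()) + sum(freq_b.values())
    return 2.0 * shared / total if total else 0.0

def msp(probe_tt, ranked_tt, top_n=300):
    """Mean similarity of the probe to the top_n ranked molecules.

    ranked_tt: TT frequency dicts in rank order, with the probe itself excluded.
    """
    top = ranked_tt[:top_n]
    if not top:
        return 0.0
    return sum(dice_similarity(probe_tt, mol) for mol in top) / len(top)

# Toy topological-torsion counts, for illustration only.
probe = {"t1": 2, "t2": 1}
ranked = [{"t1": 2, "t2": 1}, {"t3": 3}, {"t1": 1}]
msp3 = msp(probe, ranked, top_n=3)
```

A lower value indicates that the top-ranked molecules are, on
average, less obviously analogous to the probe by the Dice TT
standard.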


Database used in this study 

To measure the merit of the descriptors we need to have a
database of molecules for which we know the biological activities.
For this purpose, we use the MDL Drug Data Report ("MDDR") [6],
which is a licensed database of drug-like molecules compiled from
the patent literature. We constructed a database of ~82,000
standard molecules from MDDR, Version 98.2. Most structures have
one or more key words in the "therapeutic category" field. We will
assume that a molecule is active as an HIV protease inhibitor, for
instance, if it contains the key word "HIV-1 protease inhibitor" in
this field. There are some unavoidable limitations to using patent
databases like MDDR. First, since not every compound has been
tested in every area, one cannot assume that a compound without a
particular key word is inactive. Thus, there may be some "false
inactives." An opposite problem is that for some key words, not
all actives work by the same mechanism as the probe (for instance
by binding to the same receptor site) and we should not necessarily
expect all actives to resemble the probe. Thus, there may also be
some "false actives." However, comparisons between similarity
methods should be valid, because for any given probe, the level of
"noise" is the same for all methods.

Choice of example probes for similarity searches

In this comparison study, we will use two sets of probes. The 
first set is shown in Figures 6a and 6b. Table 4 shows how the 
activities were constructed from key words in MDDR. 



Table 4. Probes and activity keywords used in this study.

Probe          Probe name          Activity keywords from MDDR      Number of
registration                                                        actives
number

standard

090744         argatroban          thrombin inhibitor               493
091323         diazepam            anxiolytic                       3820
                                   benzodiazepine
                                   benzodiazepine agonist
091342         morphine            analgesic, opioid                869
                                   opioid agonist
                                   kappa agonist
                                   delta agonist
                                   mu agonist
091479         fenoterol           adrenergic (beta) agonist        161
115230         captopril           ACE inhibitor                    490
140603         losartan            angiotensin II blocker           2229
144822         israpafant          PAF antagonist                   1240
152580         YM-954              muscarinic (M1) agonist          858
158611         ketotifen           antihistaminic                   616
161853         2-F-NPA             dopamine (D2) agonist            127
170534         paroxetine          5HT reuptake inhibitor           219
170958         L-366948            oxytocin antagonist              176
187236         GR-83074            neurokinin antagonist            150
199183         indinavir           HIV-1 protease inhibitor         641
205402         montelukast         leukotriene antagonist           1165
221588         tamoxifen           antiestrogen                     233

peptide -> non-peptide

159880         F-DPDPE             opioid analgesics                735 non-peptide
170958         L-366948            oxytocin antagonist              159 non-peptide
174556         BQ-123              endothelin antagonist            488 non-peptide
187236         GR-83074            neurokinin antagonist            105 non-peptide
188541         G-4120              gpIIb/IIIa receptor antagonist   795 non-peptide
cycAII         [Sar,Hcy,Ile]AII

The probes and the corresponding therapeutic category in Table
4 were selected such that the following was true:

1) the probe itself was typical of a drug-like molecule or at
least could be considered a plausible "lead;"

2) compounds in the same therapeutic category as the probe were
fairly numerous and diverse; and

3) the therapeutic category was fairly specific, so that most of
the molecules probably work by the same mechanism.

This set was used for what could be considered "standard"
similarity searching, wherein the idea is to search for actives
which most resemble the probe. All actives from the MDDR are
considered.

The second set of probes is in Figure 7 and Table 4. Similar
criteria were used to select them, except that these are
exclusively peptide-like molecules (including two from the first
set). A familiar example we wanted to include is angiotensin II
blockers, but MDDR does not contain a peptide antagonist. We
therefore took the probe from Spear et al. [7]. These examples are
used to test the ability of LaSSI to select non-peptide actives
given a peptide probe. Therefore not all the actives in MDDR are
considered, but only the non-peptide ones. There are many possible
ways to define "non-peptide," but for our purposes we will consider
a molecule a non-peptide if it does not include the substructure:
N-Csp3-C(=O)-N-Csp3-C(=O).

RESULTS OF THE COMPARISON STUDY

Measures of merit for standard similarity searches 

Tables 5a and 5b list measures of merit for Dice relative to 
LaSSI with optimized singular values. The last row of the global 
enhancement table and the initial enhancement table shows the 
enhancement averaged over all of the probes. This number can be 
taken as a qualitative measure of goodness or efficacy of the 
method. 



Table 5a. Measures of merit for Dice and LaSSI where the number
of singular values is optimized (global enhancement).

Probe/                            Dice  LaSSI   best   Dice  LaSSI   best   Dice  LaSSI   best
Activity                            AP     AP no. SVs     TT     TT no. SVs   APTT   APTT no. SVs
                                                   AP                   TT                 APTT

090744 thrombin inhibitors        55.7   35.8    160   33.7   19.0    290   71.6   53.2    170
091323 anxiolytics                 1.3    1.1    320    1.5    1.1     20    1.5    1.1    220
091342 opioid analgesics           2.2    1.6    800    1.1    3.3     40    1.7    1.7    470
091479 adrenergic agonists         1.5   28.7    330   27.3   77.3    220    9.4   14.6    170
115230 ACE inhibitors             18.7   14.2   1000   18.1   17.2    650   18.7   17.8    950
140603 AII blockers               36.7   36.0    100   36.6   35.7    110   36.9   36.1    100
144822 PAF antagonists             2.5    1.7    970    1.4    1.3    260    2.0    1.9    850
152580 muscarinic agonists        12.8   16.1    100    6.3    4.7     20   13.5   14.4     70
158611 antihistamines              2.1    2.3    430    1.4    2.0    260    1.6    2.0    430
161853 dopamine agonists           4.5    7.1    760    4.6   27.5     80    5.9    6.6    800
170534 5HT reuptake inhibitors     3.2    2.0    300    1.6    0.9    170    2.5    2.5    150
170958 oxytocin antagonists        2.8    2.2    100    1.8    3.0    260    2.5    1.7    510
187236 neurokinin antagonist       4.3    1.8     90    3.7    2.3      5    4.6    7.1    100
199183 HIV protease inhibitors    22.1   20.4     60   17.2    6.5    260   21.5   10.9    160
205402 leukotriene antagonists     8.7    7.2     50    6.1    3.2    220    9.2    3.1    420
221588 antiestrogens               2.9    4.1    300    2.9    3.1    270    3.7    5.2    650

mean                              11.4   11.4          10.3   13.0          12.9   11.2

Table 5b. Initial enhancement (@300) optimized singular values

Probe/                            Dice  LaSSI   best   Dice  LaSSI   best   Dice  LaSSI   best
Activity                            AP     AP no. SVs     TT     TT no. SVs   APTT   APTT no. SVs
                                                   AP                   TT                 APTT

090744 thrombin inhibitors        90.2   70.0    160   89.1   75.1    290  109.2   83.5    170
091323 anxiolytics                 4.7    6.2    320    4.4    4.3     20    5.7    6.9    220
091342 opioid analgesics          17.5   23.2    800   30.8   26.1     40   30.2   30.2    470
091479 adrenergic agonists        32.6   34.3    330   44.6   72.1    220   37.7   42.9    170
115230 ACE inhibitors             34.9   76.1   1000   29.3   47.9    650   34.9   71.6    950
140603 AII blockers               37.2   37.2    100   37.2   37.2    110   37.2   37.3    100
144822 PAF antagonists            23.2   29.6    970   32.1   34.1    260   31.2   32.7    850
152580 muscarinic agonists        46.0   49.9    100   29.9   36.7     20   45.1   51.2     70
158611 antihistamines             30.0   44.8    430   51.6   59.2    260   44.8   50.7    430
161853 dopamine agonists          17.4   84.8    760   50.0   60.9     80   34.8   78.3    800
170534 5HT reuptake inhibitors    18.9   18.9    300    5.0    7.6    170    7.6   22.7    150
170958 oxytocin antagonists       20.4  23.54    100   21.9   18.8    260   20.4   23.5    510
187236 neurokinin antagonist      11.0   16.7     90   12.9   14.7      5   12.9   27.6    100
199183 HIV protease inhibitors    55.6   56.0     60   60.3   69.8    260   62.9   58.2    160
205402 leukotriene antagonists    37.2   37.9     50   42.9   33.0    220   44.1   35.8    420
221588 antiestrogens              54.5   51.0    300   53.3   47.4    270   66.4   65.2    650

mean                              33.2   41.8    366   37.2   40.3    195   39.1   44.9    388
                                                ±321                 ±154                 ±284



In Table 5a, no clear superiority of TOPOSIM over LaSSI for
the global enhancement example is evidenced, and no clear advantage
to using atom pairs and topological torsions together ("APTT")
relative to atom pairs ("AP") and topological torsions ("TT")
individually. However, with reference to Table 5b, for initial
enhancement, we have determined that there is a clear advantage of
LaSSI over TOPOSIM. We believe that this advantage may result at
least in part because the number of singular values was adjusted to
maximize the initial enhancement. We have also recognized a clear
advantage in using combination descriptors for both Dice and LaSSI.
The optimum number of singular values for LaSSI varies from as low
as 5 to 1000 singular values for AP and TT descriptors and from 70
to 950 for APTT. Henceforth, when comparing Dice and LaSSI, we
will consider only the APTT combination since it appears to yield
the optimum or substantially optimum results.

In a real example, a user would not know the actives in 
advance. It is therefore important to know how sensitive the 
measures of merit are to the number of singular values. Figure 8 
shows the initial enhancement as a function of number of singular 
values for three examples. The results can be somewhat sensitive 
to the number of singular values and different examples may show 
different sensitivities. If one is to pick a number of singular
values to start with, one might pick 400, a number near 388, the
mean optimum number of singular values over the examples. Table 6
compares the measures of merit for the optimized number of singular 
values vs 400 singular values. 



Table 6. Enhancements for the best number of singular values vs
400 singular values.

Probe/                          global enhancement   initial enhancement    best
Activity                          Dice  LaSSI  LaSSI   Dice  LaSSI  LaSSI    no.
                                  APTT   APTT   APTT   APTT   APTT   APTT    SVs
                                         best    400          best    400
                                      no. SVs    SVs       no. SVs    SVs

090744 thrombin inhibitors        71.6   53.2    6.4  109.2   83.5   57.1    170
091323 anxiolytics                 1.5    1.1    1.1    5.7    6.9    5.6    220
091342 opioid analgesics           1.7    1.7    1.3   30.2   30.2   28.0    470
091479 adrenergic agonists         9.4   14.6   34.9   37.7   42.9   27.4    170
115230 ACE inhibitors             18.7   17.8   15.1   34.9   71.6   45.1    950
140603 AII blockers               36.9   36.1   30.0   37.2   37.3   37.2    100
144822 PAF antagonists             2.0    1.9    1.6   31.2   32.7   29.4    850
152580 muscarinic agonists        13.5   14.4    3.0   45.1   51.2   33.2     70
158611 antihistamines              1.6    2.0    1.9   44.8   50.7   50.2    430
161853 dopamine agonists           5.9    6.6   11.6   34.8   78.3   54.4    800
170534 5HT reuptake inhibitors     2.5    2.5    1.7    7.6   22.7    8.8    150
170958 oxytocin antagonists        2.5    1.7    2.1   20.4   23.5   22.0    510
187236 neurokinin antagonist       4.6    7.1    7.8   12.9   27.6   20.3    100
199183 HIV protease inhibitors    21.5   10.9    4.8   62.9   58.2   43.1    160
205402 leukotriene antagonists     9.2    3.1    3.1   44.1   35.8   35.6    420
221588 antiestrogens               3.7    5.2    3.0   66.4   65.2   51.0    650

mean                              12.9   11.2    8.1   39.1   44.9   34.3

For about a third of the probes there is a significant 
degradation of the initial enhancement at 400 singular values. 
These are not necessarily the ones where the best number of 
singular values differs the most from 400, however. The 
degradation at 400 singular values is never so bad that LaSSI is
rendered useless.

Correlation of ranks between descriptors 

When we compare the ranks of actives by LaSSI and Dice, we see 
that there is little to no correlation for any of the probes. An 
example is shown in Figure 9. The actives are scattered and do not 
fall near the diagonal. LaSSI is clearly selecting very different 
actives than Dice. We can select molecules with strikingly
different ranks by calculating disparity = log(rank Dice/rank
LaSSI). Figure 10 shows examples from three probes where
abs(disparity) is at least 0.5 (the ranks differ by a factor of more
than ~3) and one of the ranks is at least 300 and the other less than
or equal to 300.
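
The selection by disparity can be sketched as follows. The base of
the logarithm is not stated in the text; base 10 is assumed here
because a disparity of 0.5 then corresponds to a factor of about 3.2,
in line with the factor quoted above. The function names are
hypothetical.

```python
import math

def disparity(rank_dice, rank_lassi):
    """log10(rank Dice / rank LaSSI); base 10 is an assumption."""
    return math.log10(rank_dice / rank_lassi)

def strikingly_different(ranks, min_abs=0.5, cutoff=300):
    """Select molecules whose two ranks straddle the cutoff and differ strongly.

    ranks: dict mapping a molecule id to (rank by Dice, rank by LaSSI).
    """
    picked = []
    for mol_id, (rd, rl) in ranks.items():
        straddles = (rd >= cutoff and rl <= cutoff) or (rl >= cutoff and rd <= cutoff)
        if straddles and abs(disparity(rd, rl)) >= min_abs:
            picked.append(mol_id)
    return picked

# Toy ranks, for illustration only.
toy_ranks = {
    "m1": (1000, 100),   # favored by LaSSI
    "m2": (200, 150),    # both within the top 300
    "m3": (250, 900),    # favored by Dice
    "m4": (320, 290),    # straddles 300 but the ranks are close
}
selected = strikingly_different(toy_ranks)
```

A positive disparity marks a molecule ranked better by LaSSI than by
Dice, and a negative disparity the reverse.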

Diversity of actives

Figure 11 shows the MSP300 as a function of number of singular
values for three probes. For any given probe, the MSP300 for LaSSI
is somewhat lower than the MSP300 for Dice, indicating an extra bit
of "fuzziness" provided by LaSSI. For all probes, we have found
the MSP300 for LaSSI is fairly constant until the number of
singular values goes below about 20. In other words, for most
numbers of singular values, LaSSI finds different actives than Dice
in the top 300, but the diversity of the picks is not very much
larger. For very low numbers of singular values, there is much more
fuzziness in the results provided by the LaSSI methodology.

Selection of non-peptides using a peptide probe 

LaSSI has the potential of finding non-peptide actives given 
a peptide probe. Again we looked at initial enhancement as a 
function of number of singular values, this time taking into 
account only the non-peptide actives. Since the number of actives 
in the top 300 tends to be small, there tends to be more than one
local maximum and other criteria need to be used. We chose as
"best" the lowest number of singular values where the number of
actives was a local maximum, and where the lowest ranking actives
looked the least peptide-like. Generally the best number of
singular values is very small (e.g., less than 20). This is
consistent with the "fuzziness" of LaSSI increasing only at low
numbers of singular values.

Figure 12 shows the accumulation of non-peptide actives as a
function of rank for the 187236 non-peptide example. Although
overall the Dice curve is fairly hyperbolic at a large scale, i.e.,
the global enhancement is high, at ranks below a few thousand it
falls below the diagonal. This is because the front of the list is
highly enriched in peptides of any activity. In other words, to
Dice nearly any peptide resembles a peptide oxytocin antagonist
probe more than a non-peptide oxytocin antagonist does. The
non-peptide actives are displaced to higher ranks, i.e., the initial
enhancement is low. In contrast, on a large scale the LaSSI curve
tends to drift toward the random line, i.e., the global enhancement
is low. However, at low ranks the curve falls well above the
random line, i.e., the initial enhancement is high. This is
typical behavior for the peptide to non-peptide problem.
The figures of merit are shown in Table 7.

Table 7. Enhancements for peptide probes selecting non-peptide
actives

Probe     Initial        Initial        Best no. SVs   Probability
          enhancement    enhancement    for LaSSI      due to chance
          Dice APTT      LaSSI APTT     APTT

159880    0              1.9            2              0.054
170958    0              2.0            7              1.000
174556    0              2.7            9              0.003*
187236    0              9.4            2              0.006*
188541    0              8.5            15             <0.001*
cycAII    0              2.1            2              0.005*

*significant



Consistent with the behavior of the Dice curves, the initial
enhancement for Dice is zero, i.e., much worse than random, for all
peptide probes. The initial enhancements for LaSSI are modest,
e.g., all less than 10, compared to those for the standard
similarity probes with LaSSI or Dice, which average 30-40, but
given the difficulty that Dice has, this is encouraging. When the
initial enhancements get below ~10, it becomes necessary to check
whether the initial enhancement could have come about by chance.
For each probe, we generated 1000 control sets wherein the ranks of
the actives have been randomly assigned. We then see what fraction
of the control sets have as many or more actives in the top 300 as
the real search. Taking a probability of 0.05 as the cutoff below
which the initial enhancement is considered not due to chance, we
see that LaSSI does much better than chance for four out of six
examples, with one near miss. Another type of control is to
systematically assign the wrong activity to the ranked list. For
example, we can calculate the initial enhancement for the ranked
list for 187236 using the list of angiotensin II blockers instead of
the correct list of neurokinin antagonists. With the exception of
the 170958 example, which is clearly not significant, the right
activity always gives a much higher initial enhancement than does
any of the wrong activities.
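
The random-control check described above can be sketched as follows.
The sketch assumes Python's standard random module with a fixed seed
for reproducibility; the function name chance_probability and the toy
numbers in the usage lines are hypothetical and are not taken from
the study.

```python
import random

def chance_probability(n_actives, n_candidates, observed_in_top,
                       cutoff=300, n_controls=1000, seed=0):
    """Fraction of random control sets with at least as many actives in the
    top `cutoff` ranks as the real search found."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_controls):
        # Randomly assign distinct ranks in 1..N to the actives.
        ranks = rng.sample(range(1, n_candidates + 1), n_actives)
        in_top = sum(1 for r in ranks if r <= cutoff)
        if in_top >= observed_in_top:
            hits += 1
    return hits / n_controls

# Toy usage: 100 actives in a database of 82,000 candidates.
p_none = chance_probability(100, 82000, observed_in_top=0)
p_many = chance_probability(100, 82000, observed_in_top=20)
```

A search that finds no actives in the top 300 is matched by every
control set, so its probability is 1. Finding 20 actives where fewer
than one is expected at random is essentially never matched.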

Figure 13 shows the molecules which have the most disparate
ranks in the significant peptide to non-peptide examples. Clearly,
the molecules in this figure resemble drug-like molecules more than
they do oligopeptides. On the other hand, one can pick some
salient features seen in the peptide probes, although the
topological distance between the features is not the same in the
peptide and non-peptide and the exact nature of the groups is
different.

DISCUSSION OF THE COMPARISON STUDY AND THE RESULTS THEREOF

Similarity searches are the most useful early in a drug- 
discovery project when few actives are known and little is known 
about what features of these molecules confer activity. It has 
been our experience that it is always useful to try different 
methods of calculating similarity, since each has a potentially 
"different" view of chemistry. In the realm of small molecule 
probes, LaSSI certainly selects different actives than does Dice,
and is thus a useful complement to TOPOSIM.

The fact that LaSSI, unlike Dice, has the number of singular 
values as an adjustable parameter adds flexibility but also 
introduces a complication. The goodness of the results can be 
sensitive to this parameter and the optimum number of singular 
values varies unpredictably from problem to problem. Fortunately, 
since LaSSI is so fast to run, it is a trivial matter to run
several searches at different numbers of singular values.

LaSSI has the novel ability to help select non-peptide actives
given a peptide probe when the number of singular values is low.
We believe that the range of acceptable numbers of singular values
for this application is narrow. Most topological similarity methods
based on atom-level descriptors have not been able to do this.
This is basically because the backbone accounts for many of the
descriptors and therefore dominates the similarity. Also, because
the active conformation of peptides is often compact, e.g.,
beta-turns, the topological distances are often not correlated with
the through-space distances. By adjusting the number of singular
values downward, one can set LaSSI so that it captures the
important features of a peptide and "blurs" out the atomic detail,
including topological distance.

Having the ability to go from a peptide to non-peptides in a
topological search is very desirable. Often in medicinal
chemistry, an investigator has only peptide leads, but cannot
develop a drug from them since peptides have poor transport
properties. He or she needs to find non-peptide actives. The only
way to find them by searching a database has been by 3-D similarity
methods and/or 3-D substructure searching. However, for 3-D
similarity it is necessary to construct a three-dimensional model
of the peptide probe, which requires enough experimental information
to specify its active conformation. Generating a pharmacophore for
a 3-D substructure search query usually requires several semi-rigid
analogs. This type of data is hard to get. Also, 3-D similarity
methods are a few orders of magnitude slower than topological
methods. Thus, although LaSSI's ability to find non-peptide
actives might be modest compared to more expensive methods, there
is an important application for LaSSI early in a project when
structural and SAR data are lacking.

Figure 14 is an illustration of a main central processing unit 
for implementing the computer processing in accordance with a 
computer implemented embodiment of the present invention. The 
procedures described herein are presented in terms of program 
procedures executed on, for example, a computer or network of
computers.

Viewed externally in Figure 14, a computer system designated 
by reference numeral 900 has a computer 902 having disk drives 904 
and 906. Disk drive indications 904 and 906 are merely symbolic of 
a number of disk drives which might be accommodated by the computer 
system. Typically, these would include a floppy disk drive 904, a 
hard disk drive (not shown externally) and a CD ROM indicated by 
slot 906. The number and type of drives varies, typically with 
different computer configurations. Disk drives 904 and 906 are in
fact optional, and for space considerations, are easily omitted
from the computer system used in conjunction with the production 
process/apparatus described herein. 

The computer system also has an optional display 908 upon
which information is displayed. In some situations, a keyboard 910
and a mouse 912 are provided as input devices to interface with the
central processing unit 902. Then again, for enhanced portability,
the keyboard 910 is either a limited function keyboard or omitted
in its entirety. In addition, mouse 912 optionally is a touch pad
control device, or a track ball device, or even omitted in its
entirety as well. In addition, the computer system also optionally
includes at least one infrared transmitter and/or infrared receiver
for either transmitting and/or receiving infrared signals, as
described below.

Figure 15 illustrates a block diagram of the internal hardware
of the computer system 900 of Figure 14. A bus 914 serves as the
main information highway interconnecting the other components of
the computer system 900. CPU 916 is the central processing unit of
the system, performing calculations and logic operations required
to execute a program. Read only memory (ROM) 918 and random access
memory (RAM) 920 constitute the main memory of the computer. Disk
controller 922 interfaces one or more disk drives to the system bus
914. These disk drives are, for example, floppy disk drives such
as 904, or CD ROM or DVD (digital video disk) drives such as 906,
or internal or external hard drives 924. As indicated previously,
these various disk drives and disk controllers are optional
devices.

A display interface 926 interfaces display 908 and permits
information from the bus 914 to be displayed on the display 908.
Again as indicated, display 908 is also an optional accessory. For
example, display 908 could be substituted or omitted.
Communications with external devices, for example, the components
of the apparatus described herein, occur utilizing communication
port 928. For example, optical fibers and/or electrical cables
and/or conductors and/or optical communication (e.g., infrared, and
the like) and/or wireless communication (e.g., radio frequency
(RF), and the like) can be used as the transport medium between the
external devices and communication port 928. Peripheral interface
930 interfaces the keyboard 910 and the mouse 912, permitting input
data to be transmitted to the bus 914.

In addition to the standard components of the computer, the
computer also optionally includes an infrared transmitter and/or
infrared receiver. Infrared transmitters are optionally utilized
when the computer system is used in conjunction with one or more of
the processing components/stations that transmits/receives data via
infrared signal transmission. Instead of utilizing an infrared
transmitter or infrared receiver, the computer system optionally
uses a low power radio transmitter and/or a low power radio
receiver. The low power radio transmitter transmits the signal for
reception by components of the production process, and receives
signals from the components via the low power radio receiver. The
low power radio transmitter and/or receiver are standard devices in
industry.

Figure 16 is an illustration of an exemplary memory medium 932
which can be used with disk drives illustrated in Figures 14 and
15. Typically, memory media such as floppy disks, or a CD ROM, or
a digital video disk will contain, for example, a multi-byte locale
for a single byte language and the program information for
controlling the computer to enable the computer to perform the
functions described herein. Alternatively, ROM 918 and/or RAM 920
illustrated in Figures 14 and 15 can also be used to store the
program information that is used to instruct the central processing
unit 916 to perform the operations associated with the production
process.

Although computer system 900 is illustrated having a single
processor, a single hard disk drive and a single local memory, the
system 900 is optionally suitably equipped with any multitude or
combination of processors or storage devices. Computer system 900
is, in point of fact, able to be replaced by, or combined with, any
suitable processing system operative in accordance with the
principles of the present invention, including sophisticated
calculators, and hand-held, laptop/notebook, mini, mainframe and
super computers, as well as processing system network combinations
of the same.

Conventional processing system architecture is more fully
discussed in Computer Organization and Architecture, by William
Stallings, MacMillan Publishing Co. (3rd ed. 1993); conventional
processing system network design is more fully discussed in Data
Network Design, by Darren L. Spohn, McGraw-Hill, Inc. (1993), and
conventional data communications is more fully discussed in Data
Communications Principles, by R.D. Gitlin, J.F. Hayes and S.B.
Weinstein, Plenum Press (1992) and in The Irwin Handbook of
Telecommunications, by James Harry Green, Irwin Professional
Publishing (2nd ed. 1992). Each of the foregoing publications is
incorporated herein by reference. Alternatively, the hardware
configuration is, for example, arranged according to the multiple
instruction multiple data (MIMD) multiprocessor format for
additional computing efficiency. The details of this form of
computer architecture are disclosed in greater detail in, for
example, U.S. Patent No. 5,163,131; Boxer, A., Where Buses Cannot
Go, IEEE Spectrum, February 1995, pp. 41-45; and Barroso, L.A. et
al., RPM: A Rapid Prototyping Engine for Multiprocessor Systems,
IEEE Computer, February 1995, pp. 26-34, all of which are
incorporated herein by reference.

In alternate preferred embodiments, the above-identified
processor, and, in particular, CPU 916, may be replaced by or
combined with any other suitable processing circuits, including
programmable logic devices, such as PALs (programmable array logic)
and PLAs (programmable logic arrays), DSPs (digital signal
processors), FPGAs (field programmable gate arrays), ASICs
(application specific integrated circuits), VLSIs (very large scale
integrated circuits) or the like.

The many features and advantages of the invention are apparent 
from the detailed specification, and thus, it is intended by the 
appended claims to cover all such features and advantages of the 
invention which fall within the true spirit and scope of the 
invention. Further, since numerous modifications and variations 
will readily occur to those skilled in the art, it is not desired 
to limit the invention to the exact construction and operation 
illustrated and described, and accordingly, all suitable 
modifications and equivalents may be resorted to, falling within 
the scope of the invention.

What is claimed is:

1. A method for calculating the similarity of at least one 
chemical compound to at least one chemical probe, the at least one 
chemical probe including at least another chemical compound, the 
method comprising the steps of: 

(a) creating at least one chemical descriptor for each 
compound in a collection of compounds; 

(b) representing at least one chemical descriptor for each
compound as at least one vector comprising at least one descriptor
frequency;

(c) representing, for the collection of compounds, the at least one
vector as a first vector of a molecule-descriptor matrix;

(d) performing singular value decomposition of the molecule- 
descriptor matrix to produce at least one singular matrix; 

(e) generating at least one chemical probe descriptor for the 
at least one chemical probe; 

(f) using the at least one singular matrix to transform the at
least one chemical probe descriptor of the at least one chemical
probe into a first coordinate system at least substantially similar
to a second coordinate system of the at least one compound;

(g) calculating the similarity of transformed probes to the
compounds in the collection; and

(h) outputting a list of at least a subset of compounds in the 
collection ranked in order of similarity to the at least one probe. 

2. The method as recited in claim 1, wherein said step of 
creating at least one descriptor includes generating atom pair and 
topological torsion descriptors from chemical connection tables of 
the collection of compounds. 

3. The method as recited in claim 1, wherein said step of 
creating at least one descriptor includes creating an index of 
descriptors and an index of compounds in the collection. 



4. The method as recited in claim 1, wherein said molecule-
descriptor matrix is denoted as X,

wherein said step of performing singular value decomposition
includes generating singular matrices as $X = P \Sigma Q^{T}$ of rank r,
and a reduced dimension approximation of X defined as
$X_k = P_k \Sigma_k Q_k^{T}$, $k \ll r$, where P and Q are the left and
right singular matrices representing correlations among descriptors
and compounds respectively, and $\Sigma$ represents the singular values,

wherein the at least one produced singular matrix includes a
pseudo-object denoted as $O_p$ and is calculated from a probe F by
$O_p = F^{T} P_k \Sigma_k^{-1}$, and

wherein said step of calculating the similarity between the
pseudo-object $O_p$ and the compounds in collection is computed by
taking a dot product of a normalized vector of $O_p$ with each
normalized row of $P_k$.

5. The method as recited in claim 4, wherein said similarity
calculating step includes calculating a cosine between each pair of
vectors.

6. The method as recited in claim 4, wherein said step of
performing singular value decomposition includes deriving the
reduced dimensional approximation of X by setting the k+1 through
r singular values of $\Sigma$ to zero.

7. The method as recited in claim 4, wherein similarities of
the pseudo-object to compounds in the collection are calculated by
setting the first k singular values of $\Sigma$ to one.

8. The method as recited in claim 7, wherein said setting
step includes using an identity matrix I.

9. A method of generating a searchable representation of
chemical structures comprising:

(a) generating an index of unique features;

(b) generating a feature-chemical structure matrix including
vectors that describe the chemical structures; and

(c) determining correlations between chemical structures based
on the generated feature-chemical structure matrix for generating
the searchable representation of the chemical structures.

10. The method according to claim 9, wherein the index of
unique features includes chemical descriptors.

11. The method according to claim 9, further comprising
generating the chemical descriptors from connection tables prior to
said index-generating step (a).



12. The method according to claim 9, wherein said determining
step (c) includes performing singular value decomposition of the
feature-chemical structure matrix.

13. The method according to claim 9, wherein the chemical
descriptors include at least one of atom pair descriptors,
topological torsion descriptors, charge pair descriptors,
hydrophobic pair descriptors, inherent atom property descriptors,
and geometry descriptors.

14. A computer readable medium including instructions being
executable by a computer, the instructions instructing the computer
to generate a searchable representation of chemical structures, the
instructions comprising:

(a) generating an index of unique features;

(b) generating a feature-chemical structure matrix including
vectors that describe the chemical structures; and

(c) determining correlations between chemical structures based
on the generated feature-chemical structure matrix for generating
the searchable representation of the chemical structures.

15. The computer readable medium according to claim 14,
wherein the index of unique features includes chemical descriptors.

16. The computer readable medium according to claim 14,
further comprising generating the chemical descriptors from
connection tables prior to said index-generating step (a).

17. The computer readable medium according to claim 14,
wherein said determining step (c) includes performing singular
value decomposition of the feature-chemical structure matrix.

18. The computer readable medium according to claim 14,
wherein the chemical descriptors include at least one of atom pair
descriptors, topological torsion descriptors, charge pair
descriptors, hydrophobic pair descriptors, inherent atom property
descriptors, and geometry descriptors.

19. The computer readable medium according to claim 16,
wherein the instructions further comprise the steps of: 

determining whether a user has input a query compound probe; 

generating chemical descriptors for the query compound probe; 

calculating similarities between the chemical descriptors for 
the query compound probe and the searchable representation of the 
chemical structures; and 

ranking the chemical structures by similarity to the query 
compound probe. 

20. The computer readable medium according to claim 19, 
wherein the instructions further comprise the step of: 

modifying the query compound probe based on the generated 
chemical descriptors for the query compound probe.

ABSTRACT

A novel extension of the vector space model for computing
chemical similarity is described. The instant method uses, for
example, the singular value decomposition (SVD) of a
molecule/chemical descriptor matrix to create a low dimensional
representation of the original descriptor space.
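
By way of illustration only, and not by way of limitation, the
low dimensional representation described above may be sketched in
Python with NumPy as follows. The toy matrix, the function names
(build_latent_space, project_probe, rank_compounds) and the choice
of k = 2 are assumptions made for this sketch and do not appear in
the specification. The sketch further assumes that rows of the
matrix are descriptors and columns are compounds, so that compound
coordinates are taken from the right singular matrix.

```python
import numpy as np

# Hypothetical toy data: rows are descriptors, columns are compounds.
X = np.array([
    [2.0, 1.0, 0.0, 0.0, 1.0],
    [1.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, 0.0],
])


def build_latent_space(X, k):
    """Truncated SVD of a descriptor-by-compound matrix (assumed orientation)."""
    P, s, Qt = np.linalg.svd(X, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return P[:, :k], s[:k], Qt[:k, :].T


def project_probe(f, P_k, s_k):
    """Map a probe descriptor vector f into the k-dimensional space."""
    # Equivalent to f^T P_k Sigma_k^{-1}, since Sigma_k is diagonal.
    return (f @ P_k) / s_k


def rank_compounds(o_p, Q_k):
    """Cosine similarity of the projected probe to every compound row."""
    o = o_p / np.linalg.norm(o_p)
    rows = Q_k / np.linalg.norm(Q_k, axis=1, keepdims=True)
    sims = rows @ o
    order = np.argsort(-sims)  # most similar compound first
    return order, sims


P_k, s_k, Q_k = build_latent_space(X, k=2)
probe = X[:, 0].copy()  # probe identical to the first compound
order, sims = rank_compounds(project_probe(probe, P_k, s_k), Q_k)
for idx in order:
    print(f"compound {idx}: similarity {sims[idx]:.3f}")
```

In this sketch a probe equal to a database compound projects onto
that compound's own coordinates, so its cosine similarity to that
compound is one up to rounding.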
Figure 1. Process Flow Chart
Figure 2. Probe and its twelve most similar monoterpenes selected using 2 singular values
Figure 3. Dendrograms Showing Similarities For Tanimoto and LaSSI
Figure 4. Two-dimensional Plot of Example Database Compounds and Probe Compound
Standard probes used in this study. Each is labeled by the MDDR external registry, its name, and associated activity.
Figure 8. The initial enhancement for LaSSI APTT vs the number of singular values shown for three
is attached hereto OR [] was filed on , as Application Serial No. and was amended on (if applicable).

I/We hereby state that I have reviewed and understand the contents of the above identified specification, including the claims, as amended
by any amendment referred to above.

I/We acknowledge the duty to disclose information which is known to me to be material to patentability in accordance with Title 37, Code
of Federal Regulations, Section 1.56(a).

I/We hereby claim foreign priority benefits under Title 35, United States Code, Section 119 of any foreign application(s) for patent or
inventor's certificate listed below and have also identified below any foreign application for patent or inventor's certificate having a filing
date before that of the application on which priority is claimed:

Prior Foreign Application(s): Priority Claimed

Number Country Day/Month/Year filed Yes No

I hereby claim the benefit under 35 USC Section 119(e) of any United States provisional application(s) listed below.

Prior Provisional Application(s):
Application Number Filing Date

60/128,473 04/09/99

I/We hereby claim the benefit under Title 35, United States Code, Section 120 of any United States application(s) listed below and, insofar
as the subject matter of each of the claims of this application is not disclosed in the prior United States application in the manner provided
by the first paragraph of Title 35, United States Code, Section 112, I acknowledge the duty to disclose material information as defined in
Title 37, Code of Federal Regulations, Section 1.56(a) which occurred between the filing date of the prior application and the national or
PCT international filing date of this application:

Prior U.S. Application(s):

Serial No. Filing Date Status: Patented. Pending. Abandoned 



I hereby declare that all statements made herein of my own knowledge are true and that all statements made on information and belief are
believed to be true; and further that these statements were made with the knowledge that willful false statements and the like so made are
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States Code and that such willful false statements
may jeopardize the validity of the application or any patent issued thereon.



The undersigned hereby grant(s) the firm of PEPPER HAMILTON LLP the power to insert on this Declaration any further identification,
including the application number and filing date, which may be necessary or desirable in order to comply with the rules of the United States
Patent and Trademark Office for recordation of this document.



I hereby appoint the following attorney(s) and/or agent(s) listed at the following customer number:

21269

with full power of substitution and revocation, to prosecute this application and to transact all business in the Patent and Trademark Office 
connected therewith, and all future correspondence should be addressed to the address at the aforementioned customer number, with full 
power of substitution and revocation, to prosecute this application and to transact all business in the Patent and Trademark Office connected 
therewith, and all future correspondence should be addressed to them. 

Full name of inventor: Richard D. Hull

Inventor's signature: Date:

Residence: Colts Neck, NJ
Citizenship: U.S.A.

Post Office Address: 7 Culpeper Key, Colts Neck, NJ 07722

Full name of inventor: Eugene M. Fluder, Jr.

Inventor's signature: Date:

Residence: Hamilton Square, NJ

Citizenship: U.S.A. 



Post Office Address: 8 Douglas Ct., Hamilton Square, NJ 08690 

Full name of inventor: Suresh B. Singh

Inventor's signature: Date:

Residence: Kendall Park, NJ 

Citizenship: U.S.A. 

Post Office Address: 4 Adams Road, Kendall Park, NJ 08824 



Full name of inventor: Robert P. Sheridan 

Inventor's signature: Date:

Residence: Bloomfield, NJ 

Citizenship: U.S.A. 

Post Office Address: 60 Johnson Avenue, Bloomfield, NJ 07003 


Full name of inventor: Robert B. Nachbar 

Inventor's signature: Date: 

Residence: Washington Crossing, NJ 

Citizenship: U.S.A. 



Post Office Address: 5 Coleman Lane, Washington Crossing, NJ 08560 




Full name of sole or first inventor: Simon K. Kearsley

Inventor's signature: Date:

Residence: Westfield, NJ



Citizenship: United Kingdom 



Post Office Address: 726 Coleman Place, Westfield, NJ 07090 






